Text File | 1994-03-01 | 12KB | 318 lines
CXSUB routines
--------------------------------------------------------------------------
As you know, Cx provides a very low level interface to data compression.
Many application designers, however, may be able to use a higher level
interface. The CXSUB routines provide a high level, application
independent interface to Cx data compression. The CXSUB routines have
been carefully designed to allow easy integration into existing
applications. You may be able to use the CXSUB routines in your
applications, but if not, they may be instructive in explaining
the usage of Cx.
The Source Code
--------------------------------------------------------------------------
Source code for the CXSUB routines is found in the files:
CXSUB.C - C source code
CXSUB.H - C header file
CXSUB.PAS - Turbo Pascal source code
VBCXSUB.BAS - Visual BASIC source code
Programming Interface
--------------------------------------------------------------------------
CXSUB Error Codes
------------------------------------------------------------------
CXSUB_ERR_OPENS - Could not open source.
CXSUB_ERR_OPEND - Could not open destination.
CXSUB_ERR_NOMEM - Insufficient memory.
CXSUB_ERR_READ - Could not read from source.
CXSUB_ERR_WRITE - Could not write to destination.
CXSUB_ERR_CLOSE - Could not close destination.
CXSUB_ERR_INVALID - Source file is invalid or corrupt.
cx_error_message(error)
------------------------------------------------------------------
PURPOSE:
Return an English error string from a Cx or CXSUB error.
PARAMETER:
error - error code (CX_ERR* or CXSUB_ERR*)
RETURN:
An English error message, or "unknown" if the error code is
unknown.
cx_compress_file(dst, src, method, bsize, tsize)
------------------------------------------------------------------
PURPOSE:
Compress any size or type of file to another file.
PARAMETERS:
dst - destination file name
src - source file name
method - Compression method (CX_METHOD*)
bsize - compression buffer size (1-CX_MAX_BUFFER)
tsize - temporary buffer size (CX_C_MINTEMP-CX_C_MAXTEMP)
RETURN:
CX_ERR_* - Cx error.
CXSUB_ERR_* - CXSUB error.
0 - No error.
NOTES:
For maximum compression specify bsize and tsize as large as possible.
See section 'CXSUB Single File Compression' for more information.
cx_decompress_file(dst, src)
------------------------------------------------------------------
PURPOSE:
Decompress a file compressed with cx_compress_file.
PARAMETERS:
dst - destination file name
src - source file name
RETURN:
CX_ERR_* - Cx error.
CXSUB_ERR_* - CXSUB error.
0 - No error.
NOTES:
If dst is not specified (NULL in C, '' in Pascal, "" in Visual
BASIC), only an integrity check will be performed.
See section 'CXSUB Single File Compression' for more information.
cx_compress_ofile(ofile, ifile, method, bsize, tsize)
------------------------------------------------------------------
PURPOSE:
Compress any size or type of file to another file, with files
previously opened.
PARAMETERS:
ofile - opened output file
ifile - opened input file
method - Compression method (CX_METHOD*)
bsize - compression buffer size (1-CX_MAX_BUFFER)
tsize - temporary buffer size (CX_C_MINTEMP-CX_C_MAXTEMP)
RETURN:
CX_ERR_* - Cx error.
CXSUB_ERR_* - CXSUB error.
0 - No error.
NOTES:
For maximum compression specify bsize and tsize as large as possible.
See section 'CXSUB Single File Compression' for more information.
cx_decompress_ofile(ofile, ifile)
------------------------------------------------------------------
PURPOSE:
Decompress a file compressed with cx_compress_(o)file, with
files previously opened.
PARAMETERS:
ofile - opened output file
ifile - opened input file
RETURN:
CX_ERR_* - Cx error.
CXSUB_ERR_* - CXSUB error.
0 - No error.
NOTES:
See section 'CXSUB Single File Compression' for more information.
CXSUB Single File Compression (SFC)
--------------------------------------------------------------------------
This section contains general and language specific information about
the following CXSUB functions:
cx_compress_file - file name interface
cx_decompress_file - file name interface
cx_compress_ofile - file handle interface
cx_decompress_ofile - file handle interface
Overview
---------------------------------------------------------------------
The CXSUB Single File Compression (SFC) routines provide an easy way to
compress and decompress one file to another.
There are two interfaces. One is based on file names. Using this
interface is not much harder than specifying:
"Compress file A to file B" or
"Decompress file B to file C"
Of course, the decompression routine will only work on files compressed
with the compression routine. The other interface is based on file
handles. A file handle is simply a way to reference an open file.
This interface is provided to allow for future routines based on the
SFC routines. It is possible, for example, to design an archive file
format that uses the handle based interface.
All of the provided SFC source code writes and reads the same file
format.
File Format
---------------------------------------------------------------------
The file format is a sequence of variable length 'blocks'. Blocks are
produced by reading data from a file to be compressed. The amount of
data read in each pass is known here as the 'original buffer size' or
BSIZE.
If, for example, you are compressing a 1000 byte file, and BSIZE is
100 bytes, 10 blocks will be produced. BSIZE is a parameter to the
file compression routines (parameter bsize).
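The block count in the example above follows from rounding up the file
size divided by BSIZE. A minimal sketch of that arithmetic in C; the
function name sfc_block_count is hypothetical and is not part of CXSUB:

```c
#include <assert.h>

/* Number of data blocks produced for a file of file_size bytes when
   bsize bytes are read per pass. The end-of-file marker block is not
   counted. sfc_block_count is a hypothetical helper, not a CXSUB call. */
unsigned long sfc_block_count(unsigned long file_size, unsigned bsize)
{
    assert(bsize > 0);
    /* Round up: a final partial read still produces one block. */
    return file_size / bsize + (file_size % bsize != 0);
}
```

For a 1000 byte file and a BSIZE of 100 this gives 10 blocks. A file
whose size is not a multiple of BSIZE ends with one shorter block.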
A block has 4 pieces of information:
2 bytes - original buffer size (BSIZE)
2 bytes - compressed buffer size (CSIZE)
2 bytes - 16 bit CRC (from CX_CRC) (DATACRC)
CSIZE bytes - (DATA)
The relation between these 4 pieces of information is:
if BSIZE is the same as CSIZE, the original buffer could not be
compressed. DATA contains uncompressed data.
if BSIZE is not the same as CSIZE, the original buffer was successfully
compressed. CSIZE will be strictly less than BSIZE. DATA contains
compressed data.
DATACRC is a 16 bit CRC computed on DATA. Note that this means
DATACRC is computed on compressed data.
To indicate the end of a compressed file, an abbreviated block is stored.
The abbreviated block is simply:
2 bytes - original buffer size (0)
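The block layout can be sketched in C as two small routines that write
into a memory buffer. This is an illustration only, not the code in
CXSUB.C. The names put_u16, sfc_put_block and sfc_put_end are
hypothetical, and the byte order (low byte first) is an assumption, as
this document does not state it:

```c
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: 16 bit fields are stored low byte first (little-endian),
   as is usual on DOS. The file format text does not state the order. */
static void put_u16(unsigned char *p, unsigned v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
}

/* Write one block (BSIZE, CSIZE, DATACRC, DATA) to out.
   Returns the number of bytes written, which is 6 + csize.
   The caller must provide room for that many bytes. */
size_t sfc_put_block(unsigned char *out, unsigned bsize, unsigned csize,
                     unsigned datacrc, const unsigned char *data)
{
    put_u16(out, bsize);
    put_u16(out + 2, csize);
    put_u16(out + 4, datacrc);
    memcpy(out + 6, data, csize);
    return 6 + (size_t)csize;
}

/* Write the abbreviated end of file block: a single BSIZE field of 0. */
size_t sfc_put_end(unsigned char *out)
{
    put_u16(out, 0);
    return 2;
}
```

Every data block therefore costs 6 bytes of overhead on top of DATA,
and the end of file marker costs 2 bytes.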
As an example, compressing a 25 byte file with BSIZE equal to 10, where:
bytes 0..9 compress to 7 bytes
bytes 10..19 can't be compressed
bytes 20..24 compress to 2 bytes
The file data produced from the SFC compression routines will be:
------------------------------
2 bytes - 10 block 1
2 bytes - 7
2 bytes - DATACRC
7 bytes - compressed data
------------------------------
2 bytes - 10 block 2
2 bytes - 10
2 bytes - DATACRC
10 bytes - uncompressed data
------------------------------
2 bytes - 5 block 3
2 bytes - 2
2 bytes - DATACRC
2 bytes - compressed data
------------------------------
2 bytes - 0 block 4, Abbreviated end of file block
Of course, you would typically use a BSIZE much larger than 10 bytes.
For maximum compression, you would use a BSIZE of CX_MAX_BUFFER.
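The example file above occupies 13 + 16 + 8 + 2 = 39 bytes. A reader
can step through such a file block by block using only BSIZE and CSIZE.
The following C sketch walks an SFC image held in memory. The names
get_u16 and sfc_walk are hypothetical, and little-endian byte order is
an assumption:

```c
#include <stddef.h>

/* ASSUMPTION: 16 bit fields are stored low byte first. */
static unsigned get_u16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Walk an SFC image held in memory. On success returns 0 and stores
   the number of data blocks and the total original size. Returns -1
   if the image is truncated, has no end of file block, or contains
   a block with CSIZE greater than BSIZE. */
int sfc_walk(const unsigned char *img, size_t len,
             unsigned long *blocks, unsigned long *orig_size)
{
    size_t pos = 0;
    *blocks = 0;
    *orig_size = 0;
    for (;;) {
        unsigned bsize, csize;
        if (len - pos < 2) return -1;          /* no room for BSIZE   */
        bsize = get_u16(img + pos);
        if (bsize == 0) return 0;              /* abbreviated block   */
        if (len - pos < 6) return -1;          /* header is cut short */
        csize = get_u16(img + pos + 2);
        if (csize > bsize) return -1;          /* never valid         */
        if (len - pos - 6 < csize) return -1;  /* DATA is cut short   */
        pos += 6 + (size_t)csize;              /* skip header + DATA  */
        ++*blocks;
        *orig_size += bsize;
    }
}
```

Run over the 39 byte example, this sketch would report 3 data blocks
and an original size of 25 bytes. It does not check DATACRC.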
Motivation / Questions / Expanding or Improving the SFC routines
---------------------------------------------------------------------
The following questions and answers may provide insight into the
SFC functions.
Q: Why are bsize and tsize parameters? For maximum compression,
bsize should always be CX_MAX_BUFFER and tsize should always be
CX_C_MAXTEMP.
A: Some applications want or need to minimize memory usage. By
keeping bsize and tsize parameters, the application can balance
memory usage and compression size.
Q: Why are both BSIZE and CSIZE stored?
A: By storing both, it is possible to handle incompressible data.
If BSIZE is equal to CSIZE, the stored buffer is known to be
uncompressed.
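The rule in this answer can be sketched in C as follows. This is an
illustration, not the CXSUB source. The names sfc_choose_data and
sfc_is_stored are hypothetical, and comp/comp_len stand for whatever
the compressor produced for the original buffer:

```c
#include <string.h>

/* Choose what goes into DATA for one block and return CSIZE.
   orig/bsize is the original buffer; comp/comp_len is the compressor
   output for it. out must have room for bsize bytes. */
unsigned sfc_choose_data(unsigned char *out,
                         const unsigned char *orig, unsigned bsize,
                         const unsigned char *comp, unsigned comp_len)
{
    if (comp_len < bsize) {
        memcpy(out, comp, comp_len);   /* CSIZE < BSIZE: compressed */
        return comp_len;
    }
    memcpy(out, orig, bsize);          /* CSIZE == BSIZE: stored raw */
    return bsize;
}

/* Decompression side: a block holds raw data exactly when the two
   sizes match. */
int sfc_is_stored(unsigned bsize, unsigned csize)
{
    return bsize == csize;
}
```

With this rule a block never grows beyond BSIZE plus the 6 byte block
overhead, however poorly the data compresses.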
Q: Why is a CRC computed on the compressed buffer as opposed to the
original buffer?
A: Testing has determined that a CRC on compressed buffers is better
able to detect errors than a CRC on original buffers. In addition,
as compressed buffers are typically smaller than original buffers,
a CRC on a compressed buffer is quicker to compute.
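Because DATACRC covers DATA as stored in the file, a block can be
checked before any decompression is attempted. The C sketch below shows
the idea. This document does not say which 16 bit CRC CX_CRC computes,
so the sketch uses CRC-16/ARC purely as a stand-in; real code would
call CX_CRC. The names crc16_arc and sfc_block_crc_ok are hypothetical:

```c
#include <stddef.h>

/* STAND-IN for CX_CRC. The actual CRC used by Cx is not specified in
   this document. This is CRC-16/ARC (reflected polynomial 0xA001,
   initial value 0), chosen only for illustration. */
unsigned crc16_arc(const unsigned char *data, size_t len)
{
    unsigned crc = 0;
    size_t i;
    int bit;
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001u : crc >> 1;
    }
    return crc & 0xFFFFu;
}

/* Check a block before decompressing it. The CRC is computed over the
   csize bytes of DATA exactly as read from the file.
   Returns 1 if the CRC matches DATACRC, 0 otherwise. */
int sfc_block_crc_ok(const unsigned char *data, unsigned csize,
                     unsigned datacrc)
{
    return crc16_arc(data, csize) == (datacrc & 0xFFFFu);
}
```

A failed check means DATA should not be passed to the decompressor.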
Q: Why is the last block abbreviated?
A: Simply to save space. By abbreviating the final block, it is
possible to save 4 bytes of storage for each compressed file.
Note, however, that this is a fairly arbitrary decision. As
file I/O calls consume time, it may be desirable to store a
'complete' block. This would eliminate up to 2 file I/O calls
per block when decompressing.
Q: Why isn't the compression method stored?
A: CX_DECOMPRESS can decompress any buffer compressed with CX_COMPRESS
without knowing beforehand the specific compression method used.
Q: What if I wanted to store an original file's time stamp and/or
name in a compressed file?
A: This would be a fairly easy addition. You could add a header
to the SFC file format. As an example:
4 bytes - file's time stamp
1 byte - name length (NAMELEN)
NAMELEN bytes - file name
The cx_compress_file and cx_decompress_file functions could be
modified to write, read and use this header information. The only
additional routines you would have to call (included in most
languages) are for reading and writing a file's time stamp.
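One way to write and read such a header is sketched below in C. The
header is only the suggestion made in this answer, not part of the SFC
format as shipped. The names sfc_put_header and sfc_get_header are
hypothetical, the time stamp is assumed to be stored low byte first,
and its meaning is left to the application:

```c
#include <stddef.h>
#include <string.h>

/* Write the suggested header: 4 byte time stamp, 1 byte NAMELEN, then
   the name without a terminator. Returns the number of bytes written,
   or 0 if the name does not fit in a 1 byte length.
   ASSUMPTION: the time stamp is stored low byte first. */
size_t sfc_put_header(unsigned char *out, unsigned long stamp,
                      const char *name)
{
    size_t namelen = strlen(name);
    int i;
    if (namelen > 255) return 0;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((stamp >> (8 * i)) & 0xFF);
    out[4] = (unsigned char)namelen;
    memcpy(out + 5, name, namelen);
    return 5 + namelen;
}

/* Read the header back. name must have room for 256 bytes. Returns
   the number of bytes consumed, or 0 if the input is truncated. */
size_t sfc_get_header(const unsigned char *in, size_t len,
                      unsigned long *stamp, char *name)
{
    size_t namelen;
    int i;
    if (len < 5) return 0;
    namelen = in[4];
    if (len - 5 < namelen) return 0;
    *stamp = 0;
    for (i = 0; i < 4; i++)
        *stamp |= (unsigned long)in[i] << (8 * i);
    memcpy(name, in + 5, namelen);
    name[namelen] = '\0';
    return 5 + namelen;
}
```

The first block would then start immediately after the header, at the
offset returned by sfc_get_header.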
Q: What if I wanted to extract valid data from a corrupt compressed
file?
A: This could be accomplished by expanding a block. Instead of:
2 bytes - original buffer size (BSIZE)
2 bytes - compressed buffer size (CSIZE)
2 bytes - 16 bit CRC (from CX_CRC) (DATACRC)
CSIZE bytes - (DATA)
You could specify:
4 bytes - header like '$CX$'
4 bytes - physical file location (POS)
2 bytes - original buffer size (BSIZE)
2 bytes - compressed buffer size (CSIZE)
2 bytes - 16 bit CRC (from CX_CRC) (DATACRC)
CSIZE bytes - (DATA)
With a corrupt file, you could search the file for the block header
($CX$). After finding a header, you would have all the information
you need to extract a valid original buffer. If there are errors
when decompressing a block, you would know it is invalid. Note
that smaller BSIZE values will have more potential for recovery as each
block will affect less data.
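The search step described in this answer can be sketched in C as a scan
for the 4 byte signature. The function name sfc_find_signature is
hypothetical, and the '$CX$' header exists only in the expanded block
format suggested here:

```c
#include <stddef.h>
#include <string.h>

/* Find the next '$CX$' block signature at or after offset start.
   Returns its offset, or (size_t)-1 if there is none. */
size_t sfc_find_signature(const unsigned char *img, size_t len,
                          size_t start)
{
    static const unsigned char sig[4] = { '$', 'C', 'X', '$' };
    size_t i;
    if (len < sizeof sig) return (size_t)-1;
    for (i = start; i <= len - sizeof sig; i++)
        if (memcmp(img + i, sig, sizeof sig) == 0)
            return i;
    return (size_t)-1;
}
```

The same 4 bytes can occur by chance inside DATA, so a match is only a
candidate. It should be accepted as a block only if POS agrees with the
offset where it was found and DATACRC matches the DATA that follows.
If not, the search can resume one byte past the match.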